Search Results

Documents authored by Oliveira Alves, Ana


Document
Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text

Authors: Alexandre Pinto, Hugo Gonçalo Oliveira, and Ana Oliveira Alves

Published in: OASIcs, Volume 51, 5th Symposium on Languages, Applications and Technologies (SLATE'16) (2016)


Abstract
Nowadays, there are many toolkits available for performing common natural language processing tasks, which enable the development of more powerful applications without having to start from scratch. In fact, for English, there is no need to develop tools such as tokenizers, part-of-speech (POS) taggers, chunkers or named entity recognizers (NER). The current challenge is to select which one to use, out of the range of available tools. This choice may depend on several aspects, including the kind and source of text, where the level, formal or informal, may influence the performance of such tools. In this paper, we assess a range of natural language processing toolkits with their default configuration, while performing a set of standard tasks (e.g. tokenization, POS tagging, chunking and NER), in popular datasets that cover newspaper and social network text. The obtained results are analyzed and, while we could not decide on a single toolkit, this exercise was very helpful to narrow our choice.

Cite as

Alexandre Pinto, Hugo Gonçalo Oliveira, and Ana Oliveira Alves. Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text. In 5th Symposium on Languages, Applications and Technologies (SLATE'16). Open Access Series in Informatics (OASIcs), Volume 51, pp. 3:1-3:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2016)


Copy BibTex To Clipboard

@InProceedings{pinto_et_al:OASIcs.SLATE.2016.3,
  author =	{Pinto, Alexandre and Gon\c{c}alo Oliveira, Hugo and Oliveira Alves, Ana},
  title =	{{Comparing the Performance of Different NLP Toolkits in Formal and Social Media Text}},
  booktitle =	{5th Symposium on Languages, Applications and Technologies (SLATE'16)},
  pages =	{3:1--3:16},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-006-4},
  ISSN =	{2190-6807},
  year =	{2016},
  volume =	{51},
  editor =	{Mernik, Marjan and Leal, Jos\'{e} Paulo and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2016.3},
  URN =		{urn:nbn:de:0030-drops-60086},
  doi =		{10.4230/OASIcs.SLATE.2016.3},
  annote =	{Keywords: Natural language processing, toolkits, formal text, social media, benchmark}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail